Aim

18,000 TF reporters were transfected into HepG2 and K562 cells (6 different conditions in total). Sequencing of these experiments yielded barcode counts, which are processed in this script.


Setup

Libraries

Functions

Loading data

Creating count data frames

Compare differently clustered pDNA data

Annotate data frame

Data quality plots

##            barcode    tf promoter
## 1246  TCCATCTAAGCG Trp53     mCMV
## 1768  CCACGATAGTAA Trp53     hBGm
## 2178  TGGTGAGCAATT Trp53     mCMV
## 2246  AGCAATAAGCCA Trp53     mCMV
## 3361  CGCCACAGTGTA Trp53     mCMV
## 3541  AACAGGCGGTCT Trp53     mCMV
## 3547  GCTCTCGAATTA Trp53     mCMV
## 3703  GTCTGTTCGATT Trp53     mCMV
## 3797  CTATTCTGTGCA Trp53     mCMV
## 4358  AACAGAACAAGG Trp53     mCMV
## 5109  TGGTGTAGCAGA Trp53     mCMV
## 5243  CAGCGACTGACA Trp53     mCMV
## 5305  CGATATAGCCAG Trp53     mCMV
## 5396  GCTCTACACGAG Trp53     mCMV
## 5655  CAAGGTATAGGA Trp53     mCMV
## 6927  TCGACCGGATTG Trp53     mCMV
## 7270  TCTCTCAGCGAA Trp53     hBGm
## 7359  ATCACTGTATCC Trp53     mCMV
## 7509  TGTAAGGTAGAG Trp53     mCMV
## 7883  TGGATTGAGACA Trp53     mCMV
## 8326  TCCGTAGCTTCC Trp53     mCMV
## 8358  TGTTCCTGTCTG Trp53     hBGm
## 9777  GAATAGACGGTG Trp53     mCMV
## 10215 CTTGACTCGCTG Trp53     mCMV
## 10606 TACAGCGTTACC Trp53     hBGm
## 10763 ATACTCGTTCGC Trp53     mCMV
## 11459 ACTCCGTATTGC Trp53     mCMV
## 11820 GTTCGAAGCTCA Trp53     mCMV
## 12698 CGTTCGGAGAGA Trp53     mCMV
## 12939 ACCTCTTAGGTC Trp53     mCMV
## 13232 CGTTGAATATCG Trp53     mCMV
## 13322 CGTGTACACAGC Trp53     mCMV
## 13349 CGTTGATACTCA Trp53     hBGm
## 13403 AGTCTACCTTCG Trp53     minP
## 13720 GGTCGCTCCATT Trp53     hBGm
## 13742 ACGCACGAGATT Trp53     mCMV
## 13887 CATTCCGCTTGT Trp53     minP
## 14349 TGGTGCACTGTC Trp53     mCMV
## 15447 ACACGTAGCTTA Trp53     mCMV
## 15666 CGTCCTGTAAGA Trp53     mCMV
## 15699 CGCGTCCATCAA Trp53     mCMV
## 15891 AAGCCTGACAAG Trp53     mCMV
## 16430 TCTCCTCTGTGG Trp53     mCMV
## 16926 CACCAAGGCTGA Trp53     mCMV
## 18859 TACGAACTGTGT Trp53     mCMV
## 18869 TCAGTCTACTCT Trp53     mCMV
## 19853 CACCGGAGATCT Trp53     mCMV
## 20265 ATAGAGGCTGAC Trp53     mCMV
## 20648 GGATCATAGTTC Trp53     mCMV
## 20737 GACTATCTAGCA Trp53     mCMV
## 21178 TCTACTGAGTGA Trp53     mCMV
## 21258 GAGCAGGATTGT Trp53     mCMV
## 21327 GCTGTGCAATCG Trp53     mCMV
## 22837 TGTCCGTAACTC Trp53     mCMV
## 23142 TGCTATTAGACC Trp53     mCMV
## 23540 AGTCTATTGGTG Trp53     mCMV
## 23913 CACGCGTCTGAT Trp53     mCMV
## 24108 CGGCCTCGTATT Trp53     hBGm
## 24967 CTACACACCGCT Trp53     mCMV
## 25028 AGGCGCTTAGGT Trp53     mCMV
## 25182 TGGACGCTCTTG Trp53     mCMV
## 25873 TCGTGTACTCCG Trp53     hBGm
## 26543 TCGCAAGTCAGT Trp53     mCMV
## 27779 CCACATCGGAGA Trp53     hBGm


Sample filtering


Normalization of barcode counts

Divide cDNA barcode counts by pDNA barcode counts to get activity
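The division step can be sketched as follows. This is a minimal illustration on made-up data, not the code used in this script: the column names (`cDNA`, `pDNA`, `activity`), the toy barcodes, and the reads-per-million scaling are all assumptions.

```r
# Toy data: barcodes, column names and counts are illustrative only.
counts <- data.frame(
  barcode = c("AAAC", "AAAG", "AAAT"),
  cDNA    = c(200, 50, 0),
  pDNA    = c(100, 100, 0)
)

# Assumed normalization: scale each library to reads per million so that
# sequencing depth does not drive the ratio.
counts$cDNA_rpm <- counts$cDNA / sum(counts$cDNA) * 1e6
counts$pDNA_rpm <- counts$pDNA / sum(counts$pDNA) * 1e6

# Activity = cDNA / pDNA. Barcodes with no pDNA reads get NA instead of Inf or NaN.
counts$activity <- ifelse(counts$pDNA_rpm > 0,
                          counts$cDNA_rpm / counts$pDNA_rpm,
                          NA_real_)
```

Setting barcodes without pDNA reads to `NA` keeps them visible in the table while excluding them from later averages that use `na.rm = TRUE`.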


Calculate mean activity - filter out outlier barcodes
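One way to average barcodes per reporter while discarding outliers is sketched below. The helper `mean_activity`, the cutoff of 3 median absolute deviations around the median, the reporter labels, and the toy values are assumptions for illustration; the outlier rule actually used in this script is not shown in this report.

```r
# Hypothetical helper: mean of per-barcode activities after dropping outliers.
# The cutoff (median +/- n_mad * MAD) is an assumption for illustration only.
mean_activity <- function(x, n_mad = 3) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  centre <- median(x)
  spread <- mad(x)
  # If all barcodes agree (MAD of 0), keep everything.
  keep <- if (spread == 0) rep(TRUE, length(x)) else abs(x - centre) <= n_mad * spread
  mean(x[keep])
}

# Toy per-barcode activities for two reporters (made-up values).
bc_activity <- data.frame(
  reporter = rep(c("Trp53_mCMV", "Trp53_hBGm"), each = 5),
  activity = c(1.0, 1.1, 0.9, 1.2, 8.0, 0.4, 0.5, 0.6, 0.5, 0.5)
)

# One mean activity per reporter, computed over the barcodes that pass the filter.
reporter_means <- aggregate(activity ~ reporter, data = bc_activity,
                            FUN = mean_activity)
```

In this toy example the barcode with activity 8.0 is excluded from the `Trp53_mCMV` mean, while all five `Trp53_hBGm` barcodes are kept.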

Scaling data


Re-labeling rep 3 K562 sample

# Rename the sample first, then set its condition by matching on the new sample name
bc_df_cDNA_filt$sample[bc_df_cDNA_filt$sample == "R3_K562_DMSO"] <- "K562_stressed"
bc_df_cDNA_filt$condition[bc_df_cDNA_filt$sample == "K562_stressed"] <- "K562_stressed"

Calculate correlations between technical replicates
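A replicate correlation of this kind can be computed as in the sketch below. The two vectors are made-up reporter activities, and correlating on the log2 scale is an assumption, not necessarily what this script does.

```r
# Illustrative activities for two replicates of the same condition (made-up values).
rep1 <- c(0.5, 1.0, 2.0, 4.0, 8.0)
rep2 <- c(0.6, 0.9, 2.2, 3.8, 8.5)

# Correlate on the log2 scale so that highly active reporters do not dominate.
pearson  <- cor(log2(rep1), log2(rep2), method = "pearson")
spearman <- cor(log2(rep1), log2(rep2), method = "spearman")
```

Pearson measures linear agreement between replicates, while Spearman only checks that reporters are ranked in the same order, so reporting both can help separate scaling differences from genuine disagreement.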

## `geom_smooth()` using formula 'y ~ x'



Correlation between replicates

## `geom_smooth()` using formula 'y ~ x'


Session Info

paste("Run time: ",format(Sys.time()-StartTime))
## [1] "Run time:  6.979267 mins"
getwd()
## [1] "/DATA/usr/m.trauernicht/projects/SuRE-TF/gen2"
date()
## [1] "Fri Jul  9 15:03:38 2021"
sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] pheatmap_1.0.12             PCAtools_2.2.0             
##  [3] ggrepel_0.9.1               DESeq2_1.30.1              
##  [5] SummarizedExperiment_1.20.0 Biobase_2.50.0             
##  [7] MatrixGenerics_1.2.1        matrixStats_0.58.0         
##  [9] GenomicRanges_1.42.0        GenomeInfoDb_1.26.7        
## [11] IRanges_2.24.1              S4Vectors_0.28.1           
## [13] BiocGenerics_0.36.1         viridis_0.6.0              
## [15] viridisLite_0.4.0           grr_0.9.5                  
## [17] tidyr_1.1.3                 LncFinder_1.1.4            
## [19] gridExtra_2.3               RColorBrewer_1.1-2         
## [21] readr_1.4.0                 haven_2.4.1                
## [23] ggbeeswarm_0.6.0            plotly_4.9.3               
## [25] tibble_3.1.1                dplyr_1.0.5                
## [27] vwr_0.3.0                   latticeExtra_0.6-29        
## [29] lattice_0.20-41             stringdist_0.9.6.3         
## [31] GGally_2.1.1                ggpubr_0.4.0               
## [33] ggplot2_3.3.3               stringr_1.4.0              
## [35] plyr_1.8.6                  data.table_1.14.0          
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1              backports_1.2.1          
##   [3] lazyeval_0.2.2            splines_4.0.5            
##   [5] crosstalk_1.1.1           BiocParallel_1.24.1      
##   [7] digest_0.6.27             foreach_1.5.1            
##   [9] htmltools_0.5.1.1         fansi_0.4.2              
##  [11] memoise_2.0.0             magrittr_2.0.1           
##  [13] openxlsx_4.2.3            recipes_0.1.16           
##  [15] annotate_1.68.0           gower_0.2.2              
##  [17] prettyunits_1.1.1         jpeg_0.1-8.1             
##  [19] colorspace_2.0-0          blob_1.2.1               
##  [21] xfun_0.22                 crayon_1.4.1             
##  [23] RCurl_1.98-1.3            jsonlite_1.7.2           
##  [25] genefilter_1.72.1         survival_3.2-10          
##  [27] iterators_1.0.13          glue_1.4.2               
##  [29] gtable_0.3.0              ipred_0.9-11             
##  [31] zlibbioc_1.36.0           XVector_0.30.0           
##  [33] seqinr_4.2-5              DelayedArray_0.16.3      
##  [35] BiocSingular_1.6.0        car_3.0-10               
##  [37] abind_1.4-5               scales_1.1.1             
##  [39] DBI_1.1.1                 rstatix_0.7.0            
##  [41] Rcpp_1.0.6                xtable_1.8-4             
##  [43] progress_1.2.2            dqrng_0.3.0              
##  [45] rsvd_1.0.5                foreign_0.8-81           
##  [47] bit_4.0.4                 proxy_0.4-25             
##  [49] lava_1.6.9                prodlim_2019.11.13       
##  [51] htmlwidgets_1.5.3         httr_1.4.2               
##  [53] ellipsis_0.3.2            farver_2.1.0             
##  [55] pkgconfig_2.0.3           reshape_0.8.8            
##  [57] XML_3.99-0.6              nnet_7.3-15              
##  [59] sass_0.3.1                locfit_1.5-9.4           
##  [61] utf8_1.2.1                caret_6.0-86             
##  [63] labeling_0.4.2            tidyselect_1.1.1         
##  [65] rlang_0.4.10              reshape2_1.4.4           
##  [67] AnnotationDbi_1.52.0      cachem_1.0.4             
##  [69] munsell_0.5.0             cellranger_1.1.0         
##  [71] tools_4.0.5               generics_0.1.0           
##  [73] RSQLite_2.2.7             ade4_1.7-16              
##  [75] broom_0.7.6               fastmap_1.1.0            
##  [77] evaluate_0.14             yaml_2.2.1               
##  [79] ModelMetrics_1.2.2.2      knitr_1.33               
##  [81] bit64_4.0.5               zip_2.1.1                
##  [83] purrr_0.3.4               sparseMatrixStats_1.2.1  
##  [85] nlme_3.1-152              compiler_4.0.5           
##  [87] beeswarm_0.3.1            curl_4.3                 
##  [89] png_0.1-7                 e1071_1.7-6              
##  [91] ggsignif_0.6.1            geneplotter_1.68.0       
##  [93] bslib_0.2.4               stringi_1.5.3            
##  [95] highr_0.9                 forcats_0.5.1            
##  [97] Matrix_1.3-2              vctrs_0.3.8              
##  [99] pillar_1.6.0              lifecycle_1.0.0          
## [101] jquerylib_0.1.4           irlba_2.3.3              
## [103] cowplot_1.1.1             bitops_1.0-7             
## [105] R6_2.5.0                  rio_0.5.26               
## [107] vipor_0.4.5               codetools_0.2-18         
## [109] MASS_7.3-53.1             assertthat_0.2.1         
## [111] withr_2.4.2               GenomeInfoDbData_1.2.4   
## [113] mgcv_1.8-34               hms_1.0.0                
## [115] beachmat_2.6.4            grid_4.0.5               
## [117] rpart_4.1-15              timeDate_3043.102        
## [119] class_7.3-18              DelayedMatrixStats_1.12.3
## [121] rmarkdown_2.7             carData_3.0-4            
## [123] pROC_1.17.0.1             lubridate_1.7.10